34 research outputs found

    End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification

    Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in parallel. We present a novel non-autoregressive architecture based on connectionist temporal classification and evaluate it on the task of neural machine translation. Unlike other non-autoregressive methods, which operate in several steps, our model can be trained end-to-end. We conduct experiments on the WMT English-Romanian and English-German datasets. Our models achieve a significant speedup over the autoregressive models, keeping the translation quality comparable to other non-autoregressive models. Comment: EMNLP 2018
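
    As a concrete illustration of the training setup the abstract describes, below is a minimal sketch of attaching a CTC objective to a parallel decoder, using PyTorch's nn.CTCLoss. All shapes, the blank index, and the one-state-per-source-position simplification are assumptions of the sketch, not the paper's exact architecture.

```python
# Minimal sketch: training a parallel (non-autoregressive) decoder with CTC.
# nn.CTCLoss expects log-probabilities of shape (T, N, C) and a reserved
# blank index; CTC collapses repeats and blanks, so the decoded output may
# be shorter than the T decoder states.
import torch
import torch.nn as nn

vocab_size = 32000                     # includes the CTC blank at index 0
src_len, tgt_len, batch = 50, 20, 8

# Stand-in for the encoder/decoder stack: one state per source position
# (a simplification; the real model must provide at least tgt_len states).
decoder_states = torch.randn(src_len, batch, 512)
proj = nn.Linear(512, vocab_size)
log_probs = proj(decoder_states).log_softmax(dim=-1)        # (T, N, C)

targets = torch.randint(1, vocab_size, (batch, tgt_len))    # no blanks in gold
input_lengths = torch.full((batch,), src_len, dtype=torch.long)
target_lengths = torch.full((batch,), tgt_len, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # single differentiable objective: trainable end-to-end
```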

    Attention Strategies for Multi-Source Sequence-to-Sequence Learning

    Modeling attention in neural multi-source sequence-to-sequence learning remains a relatively unexplored area, despite its usefulness in tasks that incorporate multiple source languages or modalities. We propose two novel approaches, flat and hierarchical, to combine the outputs of attention mechanisms over each source sequence. We compare the proposed methods with existing techniques and present a systematic evaluation of those methods on the WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that the proposed methods achieve competitive results on both tasks. Comment: 7 pages; accepted to ACL 2017
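
    The two proposed combination strategies can be sketched in a few lines. The plain dot-product scoring below is a simplification standing in for the paper's learned attention parametrization, so treat this as a shape-level illustration only.

```python
# "Flat": one softmax over the concatenation of all source positions.
# "Hierarchical": attend within each source, then over per-source contexts.
import torch
import torch.nn.functional as F

def flat_attention(query, sources):
    """query: (B, d); sources: list of (B, L_i, d) tensors."""
    keys = torch.cat(sources, dim=1)                     # (B, sum L_i, d)
    weights = F.softmax(torch.einsum("bd,bld->bl", query, keys), dim=-1)
    return torch.einsum("bl,bld->bd", weights, keys)

def hierarchical_attention(query, sources):
    contexts = torch.stack(
        [flat_attention(query, [s]) for s in sources], dim=1)  # (B, n, d)
    weights = F.softmax(torch.einsum("bd,bnd->bn", query, contexts), dim=-1)
    return torch.einsum("bn,bnd->bd", weights, contexts)

# Example: a 30-token text source and 196 image-region vectors.
q = torch.randn(2, 512)
srcs = [torch.randn(2, 30, 512), torch.randn(2, 196, 512)]
print(flat_attention(q, srcs).shape)          # torch.Size([2, 512])
print(hierarchical_attention(q, srcs).shape)  # torch.Size([2, 512])
```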

    CUNI System for the WMT17 Multimodal Translation Task

    In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best-scoring system is a purely textual neural translation of the source image caption to the target language. The main feature of the system is the use of additional data that was acquired by selecting similar sentences from parallel corpora and by data synthesis with back-translation. For Task 2 (cross-lingual image captioning), our best submitted system generates an English caption which is then translated by the best system used in Task 1. We also present negative results based on ideas that we believe have the potential to bring improvements, but which did not prove useful in our particular setup. Comment: 8 pages; camera-ready submission to WMT17
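
    The back-translation step mentioned above can be sketched as follows; `reverse_model.translate` is a hypothetical interface standing in for whatever target-to-source model is available, not an API from the paper.

```python
# Back-translation data synthesis: a reverse (target -> source) model
# translates authentic target-language text, producing synthetic source
# sides; the resulting pairs are mixed into the parallel training data.

def synthesize_parallel(reverse_model, target_monolingual):
    synthetic_pairs = []
    for tgt_sentence in target_monolingual:
        src_synthetic = reverse_model.translate(tgt_sentence)
        # The machine output goes on the source side; the human-written
        # sentence stays on the target side, so the trained model still
        # learns to produce fluent target text.
        synthetic_pairs.append((src_synthetic, tgt_sentence))
    return synthetic_pairs
```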

    CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval

    We present the Charles University system for the MRL~2023 Shared Task on Multi-lingual Multi-task Information Retrieval. The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages. Our solutions to both subtasks rely on the translate-test approach. We first translate the unlabeled examples into English using a multilingual machine translation model. Then, we run inference on the translated data using a strong task-specific model. Finally, we project the labeled data back into the original language. To keep the inferred tags on the correct positions in the original language, we propose a method based on scoring the candidate positions using a label-sensitive translation model. In both settings, we experiment with finetuning the classification models on the translated data. However, due to a domain mismatch between the development data and the shared task validation and test sets, the finetuned models could not outperform our baselines.Comment: 8 pages, 2 figures; System description paper at the MRL 2023 workshop at EMNLP 202
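
    A rough sketch of the position-scoring idea: every candidate span in the original sentence is wrapped in entity markers and scored against the tagged English translation. The `scorer.score` interface, the marker tokens, and the exhaustive span search are assumptions for illustration; the paper's label-sensitive translation model may differ.

```python
# Project an entity tag from the tagged English translation back onto the
# original-language sentence by scoring candidate marker placements.

def project_entity(scorer, original_tokens, tagged_english):
    """Return the (start, end) span whose marked-up variant the
    label-sensitive translation model scores highest against the
    tagged English sentence."""
    best_span, best_score = None, float("-inf")
    n = len(original_tokens)
    for i in range(n):
        for j in range(i + 1, n + 1):
            candidate = (original_tokens[:i] + ["<e>"]
                         + original_tokens[i:j] + ["</e>"]
                         + original_tokens[j:])
            score = scorer.score(" ".join(candidate), tagged_english)
            if score > best_score:
                best_span, best_score = (i, j), score
    return best_span
```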

    CUNI System for the WMT18 Multimodal Translation Task

    We present our submission to the WMT18 Multimodal Translation Task. The main feature of our submission is the application of a self-attentive network instead of a recurrent neural network. We evaluate two methods of incorporating the visual features into the model: first, we include the image representation as another input to the network; second, we train the model to predict the visual features and use this prediction task as an auxiliary objective. For our submission, we acquired both textual and multimodal additional data. Both of the proposed methods yield significant improvements over recurrent networks and self-attentive textual baselines. Comment: Published at WMT18
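
    The second method (predicting the visual features as an auxiliary objective) can be sketched as a small regression head on top of the model's states. The mean pooling, feature dimension, MSE loss, and 0.1 weight are illustrative assumptions, not the exact formulation of the submission.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryVisualHead(nn.Module):
    """Regress the image feature vector from pooled sentence states."""
    def __init__(self, d_model=512, d_image=2048):
        super().__init__()
        self.proj = nn.Linear(d_model, d_image)

    def forward(self, states, image_features):
        pooled = states.mean(dim=1)           # (B, L, d_model) -> (B, d_model)
        return F.mse_loss(self.proj(pooled), image_features)

# Joint training objective (the weight is a hypothetical hyperparameter):
#   loss = translation_cross_entropy + 0.1 * aux_head(states, image_features)
```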

    Input Combination Strategies for Multi-Source Transformer Decoder

    In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied for recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on the tasks of multimodal translation and translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single-source baselines. Comment: Published at WMT18
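
    For the two strategies not covered by the flat/hierarchical sketch earlier in this list, here is a shape-level sketch of serial and parallel combination in a Transformer decoder layer with two encoders. Residual and normalization placement is simplified relative to a full Transformer layer.

```python
import torch
import torch.nn as nn

class TwoSourceCrossAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def serial(self, x, mem_a, mem_b):
        # Serial: one cross-attention sublayer after the other.
        x = self.norm(x + self.attn_a(x, mem_a, mem_a)[0])
        return self.norm(x + self.attn_b(x, mem_b, mem_b)[0])

    def parallel(self, x, mem_a, mem_b):
        # Parallel: attend to both sources from the same query, sum contexts.
        ctx_a = self.attn_a(x, mem_a, mem_a)[0]
        ctx_b = self.attn_b(x, mem_b, mem_b)[0]
        return self.norm(x + ctx_a + ctx_b)

layer = TwoSourceCrossAttention()
x = torch.randn(2, 10, 512)                         # decoder states
mem_a, mem_b = torch.randn(2, 30, 512), torch.randn(2, 196, 512)
print(layer.serial(x, mem_a, mem_b).shape)    # torch.Size([2, 10, 512])
print(layer.parallel(x, mem_a, mem_b).shape)  # torch.Size([2, 10, 512])
```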

    CUNI System for the WMT19 Robustness Task


    Non-Autoregressive Neural Machine Translation (Neautoregresivní neuronový strojový překlad)

    In recent years, a number of methods for improving the decoding speed of neural machine translation systems have emerged. One of the approaches that propose fundamental changes to the model architecture is non-autoregressive modeling. In standard autoregressive models, the output token distributions are conditioned on the previously decoded outputs. This conditional dependence allows the model to keep track of the state of the decoding process, which improves the fluency of the output. On the other hand, it requires the neural network computation to run sequentially, so it cannot be parallelized. Non-autoregressive models impose conditional independence on the output distributions, which means that the decoding process is parallelizable and hence the decoding speed improves. A major drawback of this approach is lower translation quality compared to the autoregressive models. The goal of non-autoregressive translation research is to find methods that improve the translation quality while retaining a high decoding speed. In this thesis, we explore the research progress so far and identify flaws in the generally accepted evaluation methodology. We experiment with non-autoregressive models trained with connectionist temporal classification. We find that even though our models achieve the best results among non-autoregressive models on the WMT 2014 data, when compared with the latest...
    Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics
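
    The distinction at the heart of this abstract can be written out as two factorizations of the output probability:

```latex
% Autoregressive: each token is conditioned on all previously generated
% tokens, so decoding is inherently sequential.
p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x)

% Non-autoregressive: tokens are conditionally independent given the
% source, so all T output distributions can be computed in parallel.
p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x)
```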